ABSTRACT


Face verification is the task of determining whether two face images belong to the same identity. For unconstrained faces in the wild, this is a very challenging task. Besides the significant degradation caused by large variations in pose, illumination, expression, aging, and occlusion, it also suffers from the large-scale, ever-expanding data needed to perform one-to-many recognition. In this paper, we propose a face verification method that learns face similarities using Convolutional Neural Networks (ConvNets). Instead of extracting features from each face image separately, our ConvNet model jointly extracts relational visual features from the two face images being compared. We train four hybrid ConvNet models to distinguish similarities between face pairs for four different face portions, and we join them at the top-layer classifier level. At the top layer, we use binary classifiers to identify the similarity of face pairs, namely a conventional Multi-Layer Perceptron (MLP), Support Vector Machines (SVM), Naive Bayes, and another ConvNet. Three face pairing configurations are discussed in this paper. Results from experiments using the Labeled Faces in the Wild (LFW) and CelebA datasets indicate that our hybrid ConvNet increases face verification accuracy by as much as 27 percentage points compared to the individual ConvNet approach. We also found that the Lateral face pair configuration, with MLP as the top-layer classifier, yields the best LFW test accuracy of 87.89% under a very strict test protocol without any face alignment, which is on par with the state of the art. We show that our approach is more flexible in terms of inferencing the learned models on out-of-sample data by testing LFW and CelebA on each other's models.
1. INTRODUCTION


Due to advances in deep learning, the quality of image detection and recognition has been increasing over the past five years [1-4]. This also benefits the field of face recognition, where performance has increased by a large margin [5-10]. The key challenges of face recognition in unconstrained environments are variations in pose, illumination, expression, age, makeup, and occlusion. The face recognition task becomes inherently more difficult when the faces to be recognized are acquired in the wild.

Traditionally, existing methods address the face recognition problem in two successive steps, namely (1) feature extraction and (2) recognition. In the feature extraction stage, a variety of hand-crafted features have been used successfully [11-14]. Although these works include learning-based feature extraction approaches, features are extracted from each image individually and separately, so some important correlations between the two compared images are lost at the feature extraction stage. At the recognition stage, different types of classifiers can be used either to classify each face into its actual identity [15, 16] or simply to determine its similarity using a distance metric [17, 18].

Current research in face recognition has produced several outstanding novel frameworks and deep network architectures. The DeepFace network involves more than 120 million parameters and uses several locally connected layers without weight sharing, rather than standard convolutional layers [19]. FaceNet uses a deep convolutional network trained to directly optimize the face embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches [20]; the benefit of FaceNet is much greater representational efficiency. Liu et al. proposed a two-stage approach that combines a multi-patch deep CNN and deep metric learning, which extracts low-dimensional but highly discriminative features for face verification and recognition [21]. The DeepID3 architectures are rebuilt from the stacked convolution and inception layers proposed in VGG Net and GoogLeNet to make them suitable for face recognition, and joint face identification-verification supervisory signals are added to both the intermediate and final feature extraction layers during training [22]. More recently, Lu et al. proposed a face verification method based on two deep convolutional neural networks (CNNs), which uses identification signals to supervise one CNN and a combination of semi-verification and identification signals to train the other [10].

Nevertheless, there are often misconceptions about the terms face recognition, face verification, and face identification. Face recognition is a general topic in the field of pattern recognition, which includes both face identification and face verification (sometimes also referred to as authentication). On the one hand, face identification is concerned with determining the identity of a person based on an image of that person (the client), by comparing it against all known (labelled) images in databases (galleries). It answers the question "Who is this person?" and is also known as one-to-many matching. On the other hand, face verification focuses on validating a claimed identity based on the image of a client, by comparing the client against a registered image from the gallery, whose identity is not necessarily known or labelled. The result of face verification is either acceptance or rejection of the claimed identity; this is also known as one-to-one matching. These terminologies are illustrated in Figure 1, which also shows that face verification can be applied to face identification by computing a confidence level, provided that the gallery labels are known.
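The use of verification for identification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `verify` stands for any hypothetical pairwise model that returns the probability that two images share an identity, and the 0.5 acceptance threshold is an assumed value.

```python
from typing import Any, Callable, Dict, Hashable, Optional, Tuple

def identify(
    probe: Any,
    gallery: Dict[Hashable, Any],
    verify: Callable[[Any, Any], float],
    threshold: float = 0.5,  # assumed acceptance threshold
) -> Tuple[Optional[Hashable], float]:
    """One-to-many identification built from one-to-one verification.

    Every labelled gallery image is verified against the probe. The label with
    the highest match confidence is returned if it reaches the threshold;
    otherwise the probe is rejected (label None) together with its best score.
    """
    scores = {label: verify(probe, ref) for label, ref in gallery.items()}
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] >= threshold else (None, scores[best])
```

Adding a new identity only requires adding its image to `gallery`; the verification model itself is unchanged, which is the flexibility argued for in this section.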

One disadvantage of face verification is that good generalization is harder to achieve than in face identification. However, face verification with deep neural networks has several advantages: (1) no retraining is required once the network generalizes well within the scope of the training data, (2) the size of the network and training data does not change much even when new labelled images are added to the galleries for the inference stage, and (3) it can easily be extended to one-to-many face recognition by using the confidence value. Face identification, on the other hand, has the advantage of being a relatively simpler approach for which good generalization is also relatively easier to achieve. However, face identification deployed in a deep neural network environment has several disadvantages: (1) the size of the network and training data expands in proportion to the number of labelled identities and images in the galleries, (2) a trained network needs to be retrained when new labelled identities and images are added, and (3) although it can be applied to the verification case, it is highly prone to False Acceptance of impostors (persons who are not registered or authorized) and False Rejection of clients (registered or authorized persons).
Figure 1. Face verification and face identification terminologies. Also shown is the implementation of verification as part of face identification.
Thus, to avoid retraining the whole network each time new samples are added, we use the face verification approach, since it gives the network more flexibility and database expansion does not require retraining. It can also facilitate one-to-many face identification, which can be performed simply by considering the matching identities and their confidence levels. The main contribution of this paper is a method for constructing four different ConvNets that learn the similarity of image pairs. These hybrid ConvNets are combined at the top-layer level and classified using binary classifiers. We show the performance of MLP, SVM, Naive Bayes and ConvNet classifiers in determining whether a pair of images shares the same identity, and we test three different configurations for pairing the images. The paper is organized as follows. Section 2 presents our proposed method, Section 3 discusses the results obtained in several experiments, and Section 4 concludes the paper.


2. RESEARCH METHOD 

A hybrid ConvNet approach has previously been used in [5], with 12 ConvNets and 8 combinations of image pairs arranged in layered configurations, and a Restricted Boltzmann Machine (RBM) as the classifier. In this work, we propose a much more compact hybrid ConvNet in which 8 image pair combinations are modeled by only 4 separate ConvNets, while assessing the performance of 3 different image pair configurations. Each pairing configuration is elaborated in the following subsection. Every ConvNet in this work is constructed using the same layer configuration; the map numbers and dimensions of the input layer and of all the convolutional and max-pooling layers are shown in Table 1.

As shown in Table 1, each ConvNet has 20 layers in total. The first layer is the image input layer, which takes RGB pair images of width i and height j as inputs (these vary according to the size of the pair image). There are 7 2D convolutional layers in total, each followed by a max-pooling layer, which extract the relational features from the image pair hierarchically. The number of maps in the convolutional layers increases from 8 to 256 up the hierarchy. Finally, the extracted features pass through 3 fully connected layers, the last of which is connected to the 2 neurons of the softmax layer that give the probability of whether the pair of images belongs to the same person. In this work, all max-pooling layers have the same size, 2 × 2 with a stride of 2. For faster computation of activations, instead of a sigmoidal activation function we use the simpler ReLU activation function, defined as f(x) = max(x, 0).
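The two element-wise building blocks named above, the ReLU activation and 2 × 2 max-pooling with stride 2, can be illustrated with a small NumPy sketch. It is only a reference for the arithmetic, not the framework the authors trained with, and the handling of odd-sized borders (dropping the last row or column) is an assumption.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """ReLU activation f(x) = max(x, 0), applied element-wise."""
    return np.maximum(x, 0)

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """2 x 2 max-pooling with stride 2 over the first two (spatial) axes.

    Accepts H x W or H x W x C arrays; an odd last row/column is dropped
    (assumed border handling).
    """
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    x = x[:h, :w]
    # Group each non-overlapping 2 x 2 window into axes 1 and 3, then take the max.
    return x.reshape(h // 2, 2, w // 2, 2, *x.shape[2:]).max(axis=(1, 3))
```

For example, pooling `np.arange(16).reshape(4, 4)` keeps the largest value of each 2 × 2 window, giving `[[5, 7], [13, 15]]`.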


Table 1. The detailed parameters of Convolution and Pooling layers.

Layer  Type                         Size / Stride
1      Image Input                  i × j × 3
2      2D Convolution               3 × 3 × 8
       Batch Normalization + ReLU
3      Max-Pooling                  2 × 2 / 2
4      2D Convolution               3 × 3 × 16
       Batch Normalization + ReLU
5      Max-Pooling                  2 × 2 / 2
6      2D Convolution               3 × 3 × 32
       Batch Normalization + ReLU
7      Max-Pooling                  2 × 2 / 2
8      2D Convolution               3 × 3 × 64
       Batch Normalization + ReLU
9      Max-Pooling                  2 × 2 / 2
10     2D Convolution               3 × 3 × 128
       Batch Normalization + ReLU
11     Max-Pooling                  2 × 2 / 2
12     2D Convolution               3 × 3 × 256
       Batch Normalization + ReLU
13     Max-Pooling                  2 × 2 / 2
14     2D Convolution               3 × 3 × 256
       Batch Normalization + ReLU
15     Max-Pooling                  2 × 2 / 2
       Dropout
16     Fully Connected              100
       ReLU + Dropout
17     Fully Connected              50
       ReLU + Dropout
18     Fully Connected              2
19     Softmax                      2
20     Top-Layer Classification     2
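To see how the map sizes evolve through the 20 layers of Table 1, the following sketch tracks the output shape of each layer. It assumes "same" padding for the 3 × 3 convolutions (so only pooling shrinks the maps) and floor division for the 2 × 2 / 2 pooling; neither detail is stated in the paper, and the helper name `table1_shapes` is hypothetical.

```python
from typing import List, Tuple

CONV_FILTERS = (8, 16, 32, 64, 128, 256, 256)  # map numbers of layers 2, 4, ..., 14 in Table 1
FC_UNITS = (100, 50, 2)                          # fully connected layers 16, 17 and 18

def table1_shapes(height: int, width: int) -> List[Tuple[str, Tuple[int, ...]]]:
    """Return (layer type, output shape) for each of the 20 layers."""
    shapes = [("Image Input", (height, width, 3))]
    h, w = height, width
    for filters in CONV_FILTERS:
        # Assumed 'same' padding: the 3 x 3 convolution keeps the spatial size.
        shapes.append(("2D Convolution 3x3", (h, w, filters)))
        h, w = h // 2, w // 2  # 2 x 2 max-pooling with stride 2
        shapes.append(("Max-Pooling 2x2/2", (h, w, filters)))
    for units in FC_UNITS:
        shapes.append(("Fully Connected", (units,)))
    shapes.append(("Softmax", (2,)))
    shapes.append(("Top-Layer Classification", (2,)))
    return shapes
```

For a hypothetical 128 × 256 lateral pair, `table1_shapes(128, 256)` returns 20 entries and the last pooling layer outputs 1 × 2 × 256. Under these assumptions, seven pooling stages divide each side by 2^7 = 128, so much smaller inputs would shrink to nothing before the fully connected layers.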


Each ConvNet is trained using a different bootstrap sample of the training data according to its designated pair. When the size of the input regions differs between ConvNets, the map sizes in the subsequent layers of those ConvNets change accordingly. To improve the generalization of the ConvNets, data augmentation is employed, where the training data is augmented with random image scaling and XY translations. The outputs of the softmax layers of all ConvNets are concatenated to form the final high-level features of the learned face similarity. These features are fed into several types of classifiers to determine whether they belong to the same identity. The proposed architecture of the hybrid ConvNet is illustrated in Figure 2, while the architecture of each ConvNet is shown in Figure 3.
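The data augmentation step (random scaling and XY translation) can be sketched in NumPy as follows. The scale range, the maximum shift, nearest-neighbour resampling about the image centre, and zero filling of uncovered pixels are all assumptions made for illustration; the paper does not specify them, and both helper names are hypothetical.

```python
import numpy as np

def scale_translate(img: np.ndarray, scale: float, dy: int, dx: int) -> np.ndarray:
    """Scale about the image centre and shift by (dy, dx) pixels, keeping the size.

    Uses inverse mapping with nearest-neighbour sampling; output pixels whose
    source falls outside the image are set to zero.
    """
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    src_y = np.rint((np.arange(h) - cy - dy) / scale + cy).astype(int)
    src_x = np.rint((np.arange(w) - cx - dx) / scale + cx).astype(int)
    valid_y = (src_y >= 0) & (src_y < h)
    valid_x = (src_x >= 0) & (src_x < w)
    out = np.zeros_like(img)
    out[np.ix_(valid_y, valid_x)] = img[np.ix_(src_y[valid_y], src_x[valid_x])]
    return out

def random_augment(img: np.ndarray, rng: np.random.Generator,
                   max_scale: float = 0.1, max_shift: int = 8) -> np.ndarray:
    """Random scale in [1 - max_scale, 1 + max_scale] and XY shifts in
    [-max_shift, max_shift] pixels (hypothetical ranges)."""
    scale = rng.uniform(1.0 - max_scale, 1.0 + max_scale)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return scale_translate(img, scale, int(dy), int(dx))
```

Because the output keeps the input size, augmented pair images can be fed to the same input layer without any change to the network.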
Figure 2. The proposed architecture of the hybrid ConvNet model
Figure 3. The proposed architecture of each ConvNet model. For simplicity, only the 2D convolution, fully connected and softmax layers are shown.


As top-layer binary classifiers, we use four models, namely a Multi-Layer Perceptron (MLP), a Support Vector Machine (SVM), a ConvNet, and a Naive Bayes classifier. These top-layer classifiers classify the combined features from all ConvNets. The feature output of each ConvNet is 2 × 8 in dimension, i.e., the 2 softmax outputs for each of the 8 image pair combinations. The feature vectors belonging to each original face pair are further concatenated as shown in Figure 4; as a result, each original face pair is represented by a 1 × 64 feature vector.
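The 64-dimensional feature vector follows from the dimensions stated above: 4 ConvNets × 8 image pair combinations × 2 softmax outputs. The sketch below assumes a ConvNet-major, then combination, then class ordering of the concatenated values; the ordering actually used in Figure 4 may differ, and the helper name is hypothetical.

```python
import numpy as np

N_CONVNETS, N_COMBINATIONS, N_CLASSES = 4, 8, 2

def final_feature_vector(softmax_outputs) -> np.ndarray:
    """Concatenate per-ConvNet softmax outputs into one 1 x 64 feature vector.

    softmax_outputs: 4 arrays (one per ConvNet), each of shape 8 x 2, holding the
    two class probabilities for each of the combinations M1..M8.
    """
    arr = np.asarray(softmax_outputs, dtype=float)
    expected = (N_CONVNETS, N_COMBINATIONS, N_CLASSES)
    if arr.shape != expected:
        raise ValueError(f"expected shape {expected}, got {arr.shape}")
    return arr.reshape(1, -1)  # row-major: ConvNet, then combination, then class
```

Each row produced this way is one training or test sample for the top-layer classifiers.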
Figure 4. Concatenation of each ConvNet feature to form a final feature vector (64 elements) for each original face pair


The MLP classifier used in this work is trained with the scaled conjugate gradient algorithm and has 100 hidden neurons and 2 output neurons. The SVM classifier uses a Gaussian kernel. Another ConvNet is also used as a classifier; it has one 2D convolution layer, 3 max-pooling layers, and 2 fully connected layers with 50 and 2 neurons, respectively. The Bayes classifier, on the other hand, is a Naive Bayes classifier with a bag-of-tokens model.
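As a concrete reference for the Naive Bayes top-layer classifier, the sketch below implements a multinomial ("bag-of-tokens") Naive Bayes in NumPy, treating the 64 non-negative softmax outputs as token weights. Interpreting "bag-of-tokens" as the multinomial event model and the Laplace smoothing value `alpha = 1.0` are assumptions; this is not the toolbox implementation the authors used, and the class name is hypothetical.

```python
import numpy as np

class MultinomialNaiveBayes:
    """Bag-of-tokens (multinomial) Naive Bayes for non-negative feature vectors."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive (Laplace) smoothing, assumed value

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Total "token" weight of each feature per class, then smoothed probabilities.
        counts = np.array([X[y == c].sum(axis=0) for c in self.classes_])
        smoothed = counts + self.alpha
        self.feature_log_prob_ = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))
        self.class_log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Joint log-likelihood: log prior + sum_k x_k * log p(feature k | class).
        joint = X @ self.feature_log_prob_.T + self.class_log_prior_
        return self.classes_[np.argmax(joint, axis=1)]
```

In this setting `X` would hold the 1 × 64 feature vectors and `y` the matched (1) or non-matched (0) labels of the training face pairs.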

We use three different image pairing configurations, namely the Lateral, Layered, and Stack configurations. The Layered configuration has been used previously, where it produced good results [5]. The Lateral configuration is constructed by simply placing the two images of a pair horizontally side by side. The Layered configuration is made by taking the grayscale intensities of each image and the average of the two grayscale intensities, and combining them as the three layers of a single image. Let I_gray be the grayscale of the first image and J_gray the grayscale of the second image; the RGB layered image pair P_layered is then given by (1):

$$P_{\text{layered}}(R, G, B) = \left( I_{\text{gray}},\; J_{\text{gray}},\; \frac{I_{\text{gray}} + J_{\text{gray}}}{2} \right) \qquad (1)$$


Meanwhile, the Stack configuration is built by stacking the pair of images, with the first image placed vertically on top of the second image. These three image pairing schemes are shown in Figure 5.
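The three pairing configurations, including equation (1) for the Layered case, can be expressed compactly with NumPy. The sketch assumes both face images are H × W × 3 arrays of the same size and uses ITU-R BT.601 luma weights for the grayscale conversion; the paper does not state which conversion was used.

```python
import numpy as np

# Assumed grayscale conversion (ITU-R BT.601 luma weights); not specified in the paper.
LUMA = np.array([0.299, 0.587, 0.114])

def to_gray(rgb: np.ndarray) -> np.ndarray:
    """H x W x 3 RGB image -> H x W grayscale intensities."""
    return rgb @ LUMA

def lateral_pair(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Lateral: images side by side, H x 2W x 3."""
    return np.concatenate([a, b], axis=1)

def stack_pair(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Stack: first image on top of the second, 2H x W x 3."""
    return np.concatenate([a, b], axis=0)

def layered_pair(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Layered, equation (1): channels (I_gray, J_gray, (I_gray + J_gray) / 2), H x W x 3."""
    i_gray, j_gray = to_gray(a), to_gray(b)
    return np.stack([i_gray, j_gray, (i_gray + j_gray) / 2.0], axis=-1)
```

Note that the Lateral and Stack pairs double one spatial dimension, whereas the Layered pair keeps the size of a single face, which is why the input size i × j in Table 1 varies with the pairing configuration.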





Figure 5. The 3 face pair configurations used in this paper: (a) Lateral, (b) Layered, and (c) Stack configurations


Each of the 4 hybrid ConvNets models the similarity of image pairs constructed from one of 4 different portions of the facial images; the portions are shown in Figure 6. In addition, 8 different combinations, denoted M1 to M8, are formed from each face portion. The 8 arrangements of image pair combinations used in this work (M1 to M8) are shown in Figure 7. We also examine the performance of each ConvNet modelling the similarity of each face portion, to determine the most discriminative face portion for face verification.
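Figure 7 states only that images are flipped and combined to form M1 to M8. One hypothetical way to obtain exactly eight arrangements from a single face pair is to enumerate the image order (2 choices) and a horizontal flip of each image (2 × 2 choices), as sketched below; the actual arrangements and their order in the paper may differ.

```python
import itertools
import numpy as np

def lateral_pair(a, b):
    """Lateral configuration: place the two images side by side."""
    return np.concatenate([a, b], axis=1)

def pair_combinations(a, b, pair_fn=lateral_pair):
    """Return 8 hypothetical arrangements (M1..M8) of one face pair.

    Enumerates image order x flip of first image x flip of second image.
    """
    combos = []
    for swap, flip_first, flip_second in itertools.product((False, True), repeat=3):
        first, second = (b, a) if swap else (a, b)
        if flip_first:
            first = first[:, ::-1]    # horizontal (left-right) flip
        if flip_second:
            second = second[:, ::-1]
        combos.append(pair_fn(first, second))
    return combos
```

Any of the three pairing functions can be passed as `pair_fn`, so the same enumeration serves the Lateral, Layered and Stack configurations.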








Figure 6. The 4 face portions used for each ConvNet (ConvNet 1 to ConvNet 4)
Figure 7. The 8 different arrangements of image pair combinations (M1 to M8) formed from a single original face pair, shown using the lateral configuration. Images are flipped and combined to form these combinations


The performance measures used to evaluate the proposed method are the accuracy, True Positive Rate (TPR), False Positive Rate (FPR), and Precision. TP is the number of image pairs having the same identity that are correctly classified as 'matched', while FP is the number of image pairs having different identities that are incorrectly classified as 'matched'. TN is the number of image pairs having different identities that are correctly classified as 'non-matched', while FN is the number of image pairs having the same identity that are incorrectly classified as 'non-matched'. TPR (also called sensitivity, recall, or probability of detection in some fields) measures the proportion of actual positives that are correctly identified as such (e.g., the percentage of image pairs having the same identity that are correctly identified as 'matched'). FPR, on the other hand, measures the proportion of image pairs having different identities that are incorrectly classified as having the same identity. Precision is the proportion of correctly classified matching pairs among all pairs classified as matching. TPR can be computed from (2) and FPR from (3), while Precision and the overall accuracy are defined in (4) and (5).


TPR = TP/(TP +FN) (2) 
FPR = FP /(FP + TN) (3) 
Precision = TP / (TP + FP) (4) 
Accuracy = (TP +TN)/(TP + TN + FP + FN) (5) 
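Equations (2) to (5) translate directly into code. The sketch below counts TP, FP, TN and FN from binary labels (1 = same identity, 0 = different identity) and applies the four formulas; the helper names are illustrative, and the counts in the usage note are hypothetical, not results from this paper.

```python
from typing import Dict, Sequence, Tuple

def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for matched (1) / non-matched (0) decisions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def verification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Equations (2) to (5)."""
    return {
        "TPR": tp / (tp + fn),                        # (2)
        "FPR": fp / (fp + tn),                        # (3)
        "Precision": tp / (tp + fp),                  # (4)
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),  # (5)
    }
```

For a hypothetical fold of 300 positive and 300 negative pairs with TP = 270, FN = 30, FP = 45 and TN = 255, this gives TPR = 0.90, FPR = 0.15, Precision ≈ 0.857 and Accuracy = 0.875.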


3. RESULTS AND ANALYSIS 

We evaluate our approach on the well-known LFW dataset, containing 7,701 images of 4,281 subjects, using the standard image-restricted protocol with no outside data [23]. This protocol defines 3,000 positive pairs and 3,000 negative pairs in total and splits them into 10 disjoint subsets for cross-validation, each containing 300 positive and 300 negative pairs. We did not perform any face alignment on the LFW dataset. We also use the CelebFaces Attributes (CelebA) dataset, which contains 202,599 face images of 10,177 identities (celebrities) collected from the Internet [24]. Following the standard evaluation protocol, images in CelebA and LFW are divided into training and test sets. Furthermore, the identities in CelebA and LFW are mutually exclusive, and the identities in the training and test sets are also strictly exclusive. Of the 10,177 identities in CelebA, the 5,901 identities that have 2 or more images in the dataset are separated into training and test sets following the outlined protocol [24]. For each of these 5,901 identities, an image is paired with another image of the same identity and with an image of a different identity, so each identity yields two image pairs (matching and non-matching). In both the LFW and CelebA datasets, 8 image combinations are further formed from each face pair for each ConvNet, so the total number of pairs for each ConvNet is expanded as described in Table 2. Each of the four hybrid ConvNets is trained on its own face portion. In our approach, the training set is further randomly split into training and validation sets in a 70:30 proportion. We performed two experiments. The first experiment investigates the performance of our proposed method on the LFW dataset and compares it against other methods. In the second experiment, we perform face verification on the CelebA dataset and compare the performance of face verification on CelebA using face similarity models trained on LFW, and vice versa. All experiments are carried out on a computer with an Intel i7-6700 CPU @ 3.40 GHz, 16 GB of RAM, and a GTX 1060 as the main GPU.
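The construction of one matching and one non-matching pair per CelebA identity, followed by the 70:30 training/validation split, can be sketched as below. How the images and the non-matching partner were chosen is not specified in the paper, so the random choices here, including drawing the partner from another eligible identity, and the helper names are assumptions for illustration.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str, int]  # (image A, image B, 1 = same identity / 0 = different)

def build_pairs(images_by_identity: Dict[str, List[str]], rng: random.Random) -> List[Pair]:
    """One matching and one non-matching pair per identity with >= 2 images."""
    ids = [k for k, imgs in images_by_identity.items() if len(imgs) >= 2]
    pairs: List[Pair] = []
    for ident in ids:
        anchor, positive = rng.sample(images_by_identity[ident], 2)
        other = rng.choice([k for k in ids if k != ident])  # assumed partner selection
        negative = rng.choice(images_by_identity[other])
        pairs.append((anchor, positive, 1))
        pairs.append((anchor, negative, 0))
    return pairs

def split_train_val(pairs: List[Pair], rng: random.Random, train_frac: float = 0.7):
    """Random 70:30 split of the training pairs into training and validation sets."""
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```

Passing a seeded `random.Random` makes the pairing and the split reproducible across runs.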


Table 2. Number of face pairs used for learning face similarities in each ConvNet

Dataset   Train positive   Train negative   Test positive   Test negative   Total
          pairs            pairs            pairs           pairs           pairs
LFW       86,400           86,400           9,600           9,600           192,000
CelebA    39,232           39,232           7,976           7,976           94,416


Table 3 shows the performance of face verification on the LFW test set using the Lateral, Layered and Stack pairing configurations, together with the verification performance of each individual ConvNet. According to the results, the Lateral pairing configuration generally delivers the best accuracy, FPR, TPR and Precision compared to the other 2 pairing configurations. It consistently outperforms the Layered and Stack configurations, yielding 0.879 accuracy with the MLP classifier. The MLP classifier also delivers the best accuracy among the top-layer classifiers, outperforming the best accuracies of SVM (0.863), Bayes (0.862) and ConvNet (0.860). MLP also yields the lowest FPR of all classifiers, only 0.137. The accuracies of the top-layer classifiers are further compared in detail as a bar plot in Figure 8(a).
Table 3. Face verification test performance for the LFW dataset using the Lateral (Lat.), Layered (Lay.) and Stack (Stk.) pairing configurations

                     Accuracy             TPR                  FPR                  Precision
Method               Lat.  Lay.  Stk.     Lat.  Lay.  Stk.     Lat.  Lay.  Stk.     Lat.  Lay.  Stk.
Individual ConvNets
ConvNet 1            0.799 0.763 0.790    0.903 0.767 0.827    0.210 0.237 0.213    0.802 0.775 0.798
ConvNet 2            0.799 0.703 0.788    0.827 0.813 0.820    0.203 0.305 0.215    0.808 0.711 0.796
ConvNet 3            0.763 0.642 0.574    0.850 0.607 0.590    0.242 0.356 0.427    0.771 0.658 0.589
ConvNet 4            0.811 0.724 0.778    0.883 0.730 0.777    0.194 0.276 0.222    0.817 0.737 0.789
Top-Layer Classifiers
MLP                  0.879 0.790 0.837    0.873 0.790 0.843    0.137 0.210 0.170    0.927 0.883 0.908
SVM                  0.863 0.778 0.843    0.873 0.787 0.840    0.147 0.230 0.153    0.922 0.871 0.917
Bayes                0.862 0.777 0.841    0.870 0.789 0.840    0.147 0.230 0.162    0.922 0.881 0.909
ConvNet              0.860 0.773 0.827    0.900 0.803 0.820    0.180 0.257 0.167    0.905 0.858 0.908


Based on Table 3, ConvNet 1 delivers the best individual performance in the Layered and Stack configurations (ConvNet 4 is best in the Lateral configuration), while ConvNet 3 is consistently the worst. The results also highlight that the top-layer classifier approach outperforms the individual ConvNets. The largest improvements in accuracy are obtained when comparing MLP (0.879) against ConvNet 3 (0.763) in the Lateral scheme, MLP (0.790) against ConvNet 3 (0.642) in the Layered scheme, and SVM (0.843) against ConvNet 3 (0.574) in the Stack scheme, where the improvements are approximately 12, 15 and 27 percentage points, respectively. Even when comparing the best top-layer classifier against the best individual ConvNet, the improvements are approximately 7, 3 and 5 percentage points, respectively, which is quite significant. These results indicate that the hybrid ConvNet scheme is able to improve on the performance of the individual ConvNets by combining them at the classifier level.
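The accuracy improvements discussed above can be recomputed from Table 3. The short script below takes, for each pairing configuration, the best top-layer classifier and compares it with the worst and with the best individual ConvNet, reporting the differences in whole percentage points; the dictionary and function names are illustrative.

```python
# Accuracies from Table 3, ordered (Lateral, Layered, Stack).
INDIVIDUAL = {
    "ConvNet 1": (0.799, 0.763, 0.790),
    "ConvNet 2": (0.799, 0.703, 0.788),
    "ConvNet 3": (0.763, 0.642, 0.574),
    "ConvNet 4": (0.811, 0.724, 0.778),
}
TOP_LAYER = {
    "MLP": (0.879, 0.790, 0.837),
    "SVM": (0.863, 0.778, 0.843),
    "Bayes": (0.862, 0.777, 0.841),
    "ConvNet": (0.860, 0.773, 0.827),
}
CONFIGS = ("Lateral", "Layered", "Stack")

def improvements(individual, top_layer):
    """Per configuration: (gain over worst individual ConvNet,
    gain over best individual ConvNet), in whole percentage points."""
    result = {}
    for k, name in enumerate(CONFIGS):
        best_top = max(acc[k] for acc in top_layer.values())
        worst_ind = min(acc[k] for acc in individual.values())
        best_ind = max(acc[k] for acc in individual.values())
        result[name] = (round(100 * (best_top - worst_ind)), round(100 * (best_top - best_ind)))
    return result
```

Calling `improvements(INDIVIDUAL, TOP_LAYER)` returns `{'Lateral': (12, 7), 'Layered': (15, 3), 'Stack': (27, 5)}`.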
Figure 8. (a) Face verification accuracy for all 3 image pair configurations and (b) Receiver Operating Characteristic curves for the MLP, SVM, Bayes and ConvNet top-layer classifiers used in this paper


Figure 8(b) further compares the top-layer classifiers using Receiver Operating Characteristic (ROC) curves. According to Figure 8(b), as the TPR increases, SVM emerges as the better classifier, since it keeps the FPR from increasing further. Thus, if a better TPR-to-FPR trade-off is required, SVM is the best choice: SVM yields 0.99 TPR at 0.3 FPR, surpassing MLP's 0.96 TPR at the same FPR.
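Operating points such as "TPR at a given FPR" are read off an ROC curve built from a classifier's match scores. A minimal, generic sketch of that construction is given below; it sweeps a decision threshold from the highest to the lowest score, groups tied scores into a single step, and is not the plotting code behind the figure.

```python
from typing import List, Sequence, Tuple

def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points for thresholds swept from high to low score.

    labels: 1 for same-identity pairs, 0 for different-identity pairs.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for pos, idx in enumerate(order):
        if labels[idx] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last of a group of tied scores.
        is_last = pos + 1 == len(order) or scores[order[pos + 1]] != scores[idx]
        if is_last:
            points.append((fp / n_neg, tp / n_pos))
    return points

def tpr_at_fpr(points: List[Tuple[float, float]], max_fpr: float) -> float:
    """Highest TPR reachable without exceeding the given FPR."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)
```

With the top-layer classifiers' match probabilities as `scores`, `tpr_at_fpr(points, 0.3)` gives the kind of operating point compared above.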

Subsequently, the performance of our method is compared against several state-of-the-art methods that use a similar image-restricted protocol with no outside labels, as adopted in our work. According to Table 4, our method is on par with state-of-the-art methods such as Robust Statistical Frontalization [25], Spartans [26], and Eigen-PEP [27]. Compared with MRF-MLBP [28], our method delivers better results, approximately 9 percentage points higher in accuracy.

To show that our approach is more flexible in terms of inferencing the learned models on external data, without retraining when new out-of-sample images are added, we perform another experiment. We examine our proposed method on the CelebA dataset, where face verification is performed using the models learned from the LFW dataset, and vice versa. The results are shown in Table 5. With the MLP classifier, the accuracy of face verification on the CelebA dataset is only slightly affected by the use of the LFW model: it decreases from 0.782 to 0.750, while the TPR is essentially unchanged (0.809 versus 0.811) and the FPR increases from 0.244 to 0.309. On the LFW dataset, the accuracy decreases by about 7 percentage points (from 0.879 to 0.808) when using the CelebA model; similarly, the TPR is not degraded, while the FPR increases from 0.137 to 0.280. The performance of these models is compared as ROC curves in Figure 9. Even though a penalty in accuracy is observed, it is still acceptable, and since this approach removes the need to retrain the whole model, it is much more flexible and easier to implement. Provided that the training data is large enough to model the similarities commonly found in faces, we expect the performance to remain comparable for out-of-sample inferencing.
Table 4. Face verification performance comparison against state-of-the-art methods on the LFW dataset with the Restricted Image protocol

Method                              Accuracy
MRF-MLBP                            0.790
Eigen-PEP                           0.889
Spartans                            0.875
Robust Statistical Frontalization   0.888
Hybrid ConvNet                      0.879


Table 5. Face verification performance for the LFW and CelebA datasets using each other's models

(Dataset) - (Model)   Top-Layer Classifier   Accuracy   TPR     FPR     Precision
CelebA - CelebA       MLP                    0.782      0.809   0.244   0.864
                      SVM                    0.785      0.812   0.241   0.866
                      Bayes                  0.788      0.804   0.226   0.874
                      ConvNet                0.782      0.832   0.266   0.854
CelebA - LFW          MLP                    0.750      0.811   0.309   0.828
                      SVM                    0.749      0.799   0.300   0.832
                      Bayes                  0.748      0.787   0.289   0.837
                      ConvNet                0.738      0.825   0.348   0.809
LFW - LFW             MLP                    0.879      0.873   0.137   0.927
                      SVM                    0.863      0.873   0.147   0.922
                      Bayes                  0.862      0.870   0.147   0.922
                      ConvNet                0.860      0.900   0.180   0.905
LFW - CelebA          MLP                    0.808      0.896   0.280   0.852
                      SVM                    0.803      0.896   0.290   0.847
                      Bayes                  0.800      0.896   0.296   0.843
                      ConvNet                0.770      0.913   0.373   0.804


4. CONCLUSION

In this paper, we propose a hybrid ConvNet framework that learns face similarity between image pairs for face verification. Face verification is preferable to traditional face identification since it provides more flexibility in inferencing trained models on out-of-sample data. We train four individual ConvNets on specific face portions to learn their similarities and combine them into a feature vector at the top-layer classifier. The model directly and jointly extracts relational visual features from face pairs under the supervision of face identities. Three pairing schemes, namely the Lateral, Layered and Stack configurations, are discussed, and four different classifiers, MLP, SVM, Bayes and ConvNet, are used to learn the high-level similarities. Based on the results obtained, we showed that the hybrid ConvNet approach can improve the accuracy of individual ConvNets by as much as 27 percentage points. Even the best-performing individual ConvNet can still be improved by at least 3 percentage points when using the proposed hybrid scheme. The MLP classifier yields the best accuracy of 0.8789 on the LFW dataset, on par with several state-of-the-art methods implementing a similar test protocol. We found that the Lateral pairing scheme delivers the best performance compared to the Layered and Stack schemes. We also showed that the learned model can be applied outside its training dataset with a minimal performance penalty, while making the implementation more flexible. Our proposed approach can be improved further by increasing the number of individual ConvNets and face portions, which could further enhance the discriminative ability of the learned similarity features. Other classifiers, such as the Joint Bayesian classifier, which takes intra- and inter-identity variance into consideration, could be used to further improve the results. Jointly training face similarities using verification and identification signals under the hybrid architecture could also improve the overall results.